Driver identification is an important research area for modern, feature-rich vehicles built around the
controller area network (CAN-Bus). Many conventional systems are used to identify the driver. Going one
step further, most researchers use CAN-Bus sensor data, but this is difficult because the protocol
varies between vehicle models. We aim to identify the driver through supervised learning algorithms
based on driving behavior analysis. To this end, we propose a driver verification technique that
evaluates driving patterns using CAN sensor measurements. In this paper, on-board diagnostics (OBD-II)
is used to capture the data from the CAN-Bus sensors, and the sensors are those listed in the SAE J1979
standard. Driver identification is possible using the services of OBD-II. We obtained two sets of
accuracy results: one on the full data set with 10 drivers and one on a partial data set with two
drivers. Accuracy is higher with fewer drivers than with more drivers. We have achieved statistically
significant results in terms of accuracy in contrast to the baseline algorithm.
1. INTRODUCTION

Every driver has an individual driving style, so drivers can be classified by analyzing their driving
patterns. This style can be regarded as a fingerprint of the driver's habits, such as acceleration, speed,
and braking, which vary from driver to driver. Driver fingerprinting could lead to important privacy
compromises [1].

Today a modern car can no longer be considered just a vehicle; it is a fully equipped smart device with
various functions such as multimedia, a security system, and different sensors [2]. Until the 1970s, cars
were fitted with at most three sensors: fuel level, coolant temperature, and oil pressure. These sensors
were very simple, and the driver was informed about the state of the engine and the amount of fuel through
magnetoelectric and light display devices [2]. Nowadays, cars are equipped with many microcomputers.
Information technology is developing rapidly and cars are connected to the internet. Using state-of-the-art
technology, all the microcomputers communicate with each other in real time through the controller area
network (CAN-Bus) [3]. As a result, drivers feel secure and comfortable during their trips, and all other
equipment functions properly.

Many technologies are used in the modern engine to make a car more efficient. Direct injection
technology was introduced in modern cars to improve engine performance [4]. According to a survey, sales
of connected cars were predicted to reach 76.3 million by 2023 [5]. Technology-based connected cars will
soon form a digital platform hosting a multitude of sensors, such as radar, light detection and ranging
(LIDAR), cameras, ultrasonic sensors, and vehicle motion sensors [5]. Thanks to state-of-the-art
technology, modern engines use less fuel and deliver more power [6]. Most cars interact with other highly
technology-based components, such as traffic lights, garage doors, and services [7]. Some cars have green
lights on the dashboard that indicate efficient driving, which improves driving style and fuel
consumption. Driving style is used not only for insurance discount policies but also for real-time
monitoring, maintenance, pathfinding, driving style development, and fuel consumption [8].

The more technology a car relies on, the more sophisticated thieves become. Attackers now use various
modern techniques to steal a car key. Vulnerabilities of connected cars will increase auto theft, which is
one of the threats [9]. Top-of-the-range vehicles are targeted by thieves who simply drive off after
bypassing security devices by hacking on-board computers [10]. One technique involves breaking into the
vehicle and plugging a laptop into the hidden diagnostic socket [10]. Penny [11] introduced the
man-in-the-middle attack, or relay attack, in which radio signals are passed between two devices. Pekaric
et al. [12] described other attacks such as GPS spoofing and message injection. BMW Group [13] seamlessly
integrates mobile devices, smart home technology, and the vehicle's intelligent interfaces into a complete
driver's environment. In 2021, they also introduced a remote door unlock system that sends a signal to
unlock the driver's door [13]. As more cars are connected to the internet, the threats being discovered
will materialize and the security of connected cars will become more important.

Previous researchers introduced biometric authentication as a significant tool based on the physical
characteristics of the driver, such as fingerprint, face or voice detection, and eye scanning, as well as
behavioral characteristics. Recognizing and analyzing the driver's driving pattern is a salient way to
improve the security of a car. Data-mining techniques have been widely used by earlier researchers to
detect such novel attacks. Because each driver has an individual driving style, data mining is also a
prominent method for detecting car theft (through unexpected driving styles). The features of a driver's
driving pattern are reflected in telemetric data.

CAN-Bus acts like a nervous system that allows configuration, data logging, and communication among
electronic control units (ECUs). Each ECU is like a part of the body, interconnected through CAN, so that
information sensed by one part can be shared with another [14]. A modern car has up to 70 ECUs, e.g. the
engine control unit, airbags, audio system, acceleration, and fuel unit [15].

In-vehicle CAN data normally consists of multi-sensor data, such as steering wheel, vehicle speed,
engine speed, and amount of fuel. Several researchers have previously proposed driver identification
methods based on in-vehicle CAN-Bus data. Because it is difficult to obtain data through a direct
connection, on-board diagnostics (OBD-II) is used. OBD-II (ISO 15765) is a self-diagnostic and reporting
capability that, for example, mechanics use to identify car issues. OBD-II specifies diagnostic trouble
codes (DTCs) and real-time data (e.g. speed, revolutions per minute (RPM)), which can be recorded from the
CAN-Bus via OBD-II loggers. Such data is still difficult to obtain: data passes over the bus at every
moment, and we need the parameter identifier (PID) number of each specific feature to extract it
correctly. Some PIDs are non-public and are defined by the manufacturer. Many authors have described the
problems of using CAN-Bus data to identify the driver [16], [17].

In this paper, we aim to identify driver behavior through telemetric data using machine learning 
algorithms. We analyze the data in terms of training, testing, and validation to get model accuracy that helps 
us with driver identification. 


2. LITERATURE REVIEW

Wakita et al. [18] use telemetric data to investigate driver identification and report that
identification accuracy decreases by 15% compared to the method. They use the role of non-public
parameters in identifying the driver. This earlier work used data from a driving simulator [18] and
investigated the driver's behavior when following another car. The observed features were accelerator
pedal, car speed, brake pedal, and distance to the car in front. A Gaussian mixture model (GMM) achieved
81% accuracy with 12 drivers and 73% with 30 drivers [18]. Zhang et al. [19] analyze the overtaking style
of each driver using accelerator and steering data, and obtain 85% accuracy for about 20 drivers with a
hidden Markov model (HMM). Other authors used smartphones to capture driver data, relying on sensors such
as GPS, accelerometer, magnetometer, and gyroscope [20]-[22]. This data is used for driver profiling and
other tasks. One author used an inertial sensor with SVM and k-means methods and obtained 60% accuracy
between two drivers.

In other research, Carfora et al. [8] and Ullah and Kim [16] acquire data from the in-vehicle CAN-Bus
via OBD-II. Azadani and Boukerche [23] use in-vehicle CAN-Bus sensor data from two datasets, the human
computer interaction (HCI)-lab dataset and the HCRL dataset, measure the performance on both, and report
94.27% accuracy for the HCI-lab dataset. Kwak et al. [9] obtained 99% accuracy from 51 features of 10
drivers using decision tree (DT), k-nearest neighbors (kNN), random forest (RF), and multilayer
perceptron (MLP). Choi et al. [24] perform driving detection and driver recognition using both the
Gaussian mixture model (GMM) and HMM methods to analyze vehicle CAN-Bus data. Kedar-Dongakar and Das [25]
classify drivers based on the energy optimization of a vehicle. Based on driving style, drivers are
classified into three types: aggressive, moderate, and conservative. The authors consider the following
features: vehicle speed, acceleration, torque, acceleration pedal, steering wheel angle, and brake pedal
pressure.

Neural network and deep learning algorithms have been studied for several years and have had a strong
impact on driver behavior identification. Xun et al. [26] introduced a convolutional neural network (CNN)
and obtained 99% accuracy for 10 drivers. For advanced driver assistance systems (ADAS), driver
identification can be an efficient way to ensure the security and protection of the vehicle.
Additionally, it extends ADAS capabilities by creating different profiles for the drivers, which supports
every driver according to their own driving style and improves ADAS fidelity [27].


3. ARCHITECTURE OF THE INTENDED SYSTEM

The architecture of the intended system, which identifies the authorized driver, is shown in
Figure 1. Modern vehicles are connected to the internet through the IEEE 802 standard [28], [29], which is
used to transfer the driver data. The analysis module analyzes the data. If the driving pattern does not
match that of the accredited driver, the driver identification cell detects this and sends a message to
the owner of the vehicle through the Wi-Fi module, which sends information via the server [30].


[Figure 1 labels: vehicle with CAN-Bus; analysis module; learning module; owner of the vehicle]

Figure 1. Architecture of the intended system for driver identification


4. METHOD 
4.1. Dataset preparation 

Driver identification requires trip data. Our model uses the Ocslab driving dataset [31]. This data is
used for driver classification and personalization based on pattern analysis. The data was collected with
KIA Motors Corporation vehicles in South Korea, and the experiment has been running since July 28, 2015.
A total of 10 drivers, labeled “A” to “J”, took part in the trips, which cover a length of 23 km,
completing two round trips between 8:00 PM and 11:00 PM. The route includes three types of road (city
road, freeway, and parking lot), each with its own characteristics. There are 94,401 records in total with
51 dimensions (51 features); Table 1 summarizes the Ocslab dataset.

Each driver drove in their own style under real driving conditions, and the in-vehicle CAN-Bus data was
collected with OBD-II and CarbbigsP (an OBD-II scanner). Not all data can be obtained, because OBD-II
identifiers and sensors have limitations; for example, OBD-II cannot provide body control status, airbag
status, or wheel angle rotation status. OBD-II has a limited set of identifiers [32] provided by the
manufacturer; Table 2 lists some parameter IDs (PIDs) of service/mode (hex) 01. There are 10 diagnostic
services described in the latest OBD-II standard, SAE J1979 [32]. A few of these services are shown in
Table 3. The statistical features of each driver's driving are thereby exposed. Figure 3 shows the time
series pattern of the in-vehicle CAN data in a real-time driving situation for drivers A and D, where the
data fluctuation is visible in RPM.


[Figure 2 labels: dataset preparation; data preprocessing and feature selection (attribute evaluator, search method, selection attribute mode); ensemble classifier; Naive Bayes, KNN, SVM, REP Tree, logistic regression]

Figure 2. Steps of the proposed system


Table 1. Driving dataset of Ocslab with type and feature

Type | Features
Engine | Engine torque, Engine coolant temperature, Maximum indicated engine torque, Activation of air compressor, ..., Friction torque
Fuel | Long term fuel trim bank1, Intake air pressure, Accelerator pedal value, ..., Fuel consumption
Transmission | Transmission oil temperature, Wheel velocity front left-hand, Wheel velocity front right-hand, Wheel velocity rear left-hand, ..., Torque converter speed

[Figure 3 plot: RPM (rev/min) over time]

Figure 3. The revolutions per minute (RPM) of drivers A and D

Table 2. List of some OBD-II parameters

Service/Mode (hex) | PID (hex) | Data bytes returned | Description | Min value | Max value | Units
01 | 03 | 2 | Fuel system status | - | - | -
 | 04 | 1 | Calculated engine load | 0 | 100 | %
 | 0C | 2 | Engine speed | 0 | 16,383.75 | rpm
 | 0D | 1 | Vehicle speed | 0 | 255 | km/h
 | 68 | 3 | Intake air temperature sensors | -40 | 215 | °C
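
The value ranges in Table 2 follow from how the returned data bytes are scaled. The sketch below decodes three of the listed PIDs using the scaling rules commonly documented for SAE J1979 service 01; the function name `decode_pid` is hypothetical and only these three PIDs are handled.

```python
def decode_pid(pid: int, data: bytes) -> float:
    """Decode the data bytes of a service 01 response for a few PIDs (illustrative only)."""
    a = data[0]
    if pid == 0x04:          # calculated engine load, percent
        return a * 100.0 / 255.0
    if pid == 0x0C:          # engine speed, rpm (two data bytes)
        return (256 * a + data[1]) / 4.0
    if pid == 0x0D:          # vehicle speed, km/h
        return float(a)
    raise ValueError(f"PID {pid:#04x} is not handled in this sketch")

print(decode_pid(0x0C, bytes([0xFF, 0xFF])))  # 16383.75, the maximum engine speed in Table 2
print(decode_pid(0x0D, bytes([0x3C])))        # 60.0
```

With both data bytes at their maximum, the engine speed formula gives 65,535 / 4 = 16,383.75 rpm, which matches the maximum value listed in Table 2.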


Table 3. List of some OBD-II services/modes (hex)

Service/Mode (hex) | Description
01 | Show current data
02 | Show freeze frame data
09 | Request vehicle information
0A | Permanent diagnostic trouble codes


4.2. Data preprocessing

The dataset contains 51 features. To transform the collected data for our classification model, we
apply feature selection, data normalization, and data processing through the sliding window technique. An
example of the data is shown in (1), where the d columns correspond to the d variables and the N rows
correspond to N instances.


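$$
X = \begin{bmatrix}
x_1^1 & x_1^2 & \cdots & x_1^d \\
x_2^1 & x_2^2 & \cdots & x_2^d \\
\vdots & \vdots & \ddots & \vdots \\
x_N^1 & x_N^2 & \cdots & x_N^d
\end{bmatrix} \tag{1}
$$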
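
To illustrate the sliding window step on a data matrix of N instances and d variables, the following is a minimal sketch. The window length and step are assumptions chosen for the toy example, since the text does not state them, and `sliding_windows` is a hypothetical helper name.

```python
import numpy as np

def sliding_windows(X: np.ndarray, window: int, step: int) -> np.ndarray:
    """Return an array of shape (n_windows, window, d) cut from X of shape (N, d)."""
    starts = range(0, X.shape[0] - window + 1, step)
    return np.stack([X[s:s + window] for s in starts])

X = np.arange(20, dtype=float).reshape(10, 2)   # toy data: N=10 instances, d=2 variables
W = sliding_windows(X, window=4, step=2)
print(W.shape)          # (4, 4, 2)
print(W.mean(axis=1))   # one mean vector per window
```

Each window can then be summarized (for example by its mean and standard deviation per variable) before being passed to a classifier.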


4.2.1. Feature selection 

We discard the following kinds of features from the dataset to improve the performance and accuracy
of the model. We used CorrelationAttributeEval as the attribute evaluator and Ranker as the search
method, with 10-fold cross-validation and seed 1 as the attribute selection mode.

- Homogeneous features = $A_h$,

- Irrelevant features = $B_i$,

- Superfluous features = $C_j$, and

- Mostly correlated features = $D_k$.
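
As a rough illustration of correlation-based ranking, the sketch below scores each feature by its absolute Pearson correlation with per-driver indicator variables, weighted by class frequency. This is a simplified stand-in written for this example, not the attribute evaluator's actual implementation; `rank_by_correlation` and the synthetic data are assumptions.

```python
import numpy as np

def rank_by_correlation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return feature indices sorted by decreasing frequency-weighted |Pearson r| with the class indicators."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for c in classes:
        indicator = (y == c).astype(float)           # 1 for driver c, 0 otherwise
        for j in range(X.shape[1]):
            r = np.corrcoef(X[:, j], indicator)[0, 1]
            scores[j] += abs(r) * indicator.mean()   # weight by class frequency
    return np.argsort(-scores)

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
informative = y + rng.normal(0, 0.1, size=100)   # tracks the class label
noise = rng.normal(0, 1, size=100)               # unrelated to the class
X = np.column_stack([noise, informative])
print(rank_by_correlation(X, y))   # the informative column (index 1) ranks first
```

Features at the bottom of such a ranking are candidates for removal as irrelevant, while pairs of features that correlate strongly with each other are candidates for removal as redundant.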

The Engine_torque and correction_of_engine_torque features are identical, and
engine_coolant_temperature is a redundant feature. Hence, feature selection was performed as described in
[2], selecting 15 features from the 51 in the original dataset. Table 4 shows the selected features with
their mean and standard deviation, computed as in (2) and (3) respectively.


$$
\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \tag{2}
$$

$$
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}} \tag{3}
$$


4.2.2. Data normalization

The features in the dataset are on different scales, so we normalize the data using the min-max
approach. Normalization is essential for some machine learning algorithms such as k-nearest neighbor
(k-NN) and SVM. The normalization formula that brings the data to a common scale is shown in (4).
Figures 4 and 5 show the Ocslab data set before and after normalization respectively.

$$
X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{4}
$$

Here, X_min is the minimum value of a feature and X_max is its maximum value.

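
The min-max scaling can be sketched per feature column as follows. The input values are toy numbers, and the handling of constant columns (mapped to 0) is an assumption added to avoid division by zero.

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Scale every column of X to [0, 1] using that column's min and max."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # constant columns map to 0
    return (X - x_min) / span

X = np.array([[800.0, 10.0], [1600.0, 30.0], [3200.0, 20.0]])  # toy values
print(min_max_normalize(X))
```

After scaling, both columns lie in [0, 1], so a distance-based classifier such as k-NN no longer lets the feature with the larger numeric range dominate.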
Figure 4. The original Ocslab dataset

Table 4. Selected 15 features with mean and standard deviation

Feature | Vehicle data type | Mean | Standard deviation | Previous work | Classifiers
Long term fuel trim bank1 | Fuel | 2.843 | 1.363 | [9] | DT, KNN, RF, MLP
Intake air pressure | | 36.85 | 27.95 | |
Accelerator pedal value | | 3.719 | 8.506 | [18], [25], [33], [34], [35] | GMM, SMG, MM, GMM, MLP, SM, FNN
Fuel consumption | | 757 | 761.13 | |
Maximum indicated engine torque | Engine | 67.5 | 9.5 | |
Engine torque | | 23.15 | 14.73 | [23] | SVM, RF, NB, KNN
Calculated load value | | 41.30 | 18.38 | |
Friction torque | | 13.7 | 2.27 | |
Activation of air compressor | | 0.89 | 0.31 | |
Engine coolant temperature | Transmission | 84.24 | 6.12 | |
Transmission oil temperature | | 80.21 | 10.5 | [9] | DT, KNN, RF, MLP
Wheel velocity front left-hand | | 30.11 | 26.48 | |
Wheel velocity front right-hand | | 29.36 | 26.22 | |
Wheel velocity rear left-hand | | 29.20 | 26.10 | |
Torque converter speed | | 1259.15 | 766.51 | [9] | DT, KNN, RF, MLP


[Figure 5 plot: normalized data (partial) of driver A; y-axis: data scaling (X)]

Figure 5. The normalized Ocslab dataset

4.3. Description of the classifiers

We evaluated the following supervised machine learning classifiers: k-NN, SVM, logistic regression,
Naive Bayes, and reduced error pruning (REP) tree. k-NN is a traditional instance-based machine learning
algorithm. It can be used for both classification and regression, and it selects the neighbors of a query
point by distance calculation. The Euclidean distance between two points is calculated as shown in (6).


$$
D(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \tag{6}
$$
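
A minimal sketch of k-NN classification with the Euclidean distance is given below. The training points and labels are toy values standing in for normalized feature vectors of drivers A and D, and `knn_predict` is a hypothetical helper, not the implementation used in the experiments.

```python
import numpy as np
from collections import Counter

def euclidean(p: np.ndarray, q: np.ndarray) -> float:
    """Euclidean distance between two points of equal dimension."""
    return float(np.sqrt(np.sum((q - p) ** 2)))

def knn_predict(X_train, y_train, query, k=1):
    """Majority vote among the k training points closest to the query."""
    distances = [euclidean(x, query) for x in X_train]
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]])
y_train = np.array(["A", "A", "D", "D"])
print(euclidean(np.array([0.0, 0.0]), np.array([3.0, 4.0])))       # 5.0
print(knn_predict(X_train, y_train, np.array([0.85, 0.85]), k=3))  # D
```

With k=3 the query's two closest neighbors belong to driver D and the third to driver A, so the vote returns D.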


The support vector machine (SVM) is used for classification and regression problems [36]. The goal is
to find a hyperplane in an N-dimensional space that separates the classes, so that a query data point can
be classified by the side of the hyperplane on which it falls. This decision boundary creates a margin to
the nearest observations, and the classifier performs better when the margin is maximized. The loss
function in (7) is used to maximize the margin.


$$
c(x, y, f(x)) = \begin{cases} 0, & \text{if } y \cdot f(x) \ge 1 \\ 1 - y \cdot f(x), & \text{else} \end{cases} \tag{7}
$$
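
The piecewise loss can be written as a single maximum. The sketch below assumes the usual convention that labels are encoded as -1 or +1 and that f(x) is the raw decision value; both are assumptions of this example.

```python
def hinge_loss(y: int, fx: float) -> float:
    """Hinge loss for a label y in {-1, +1} and a raw decision value f(x)."""
    return max(0.0, 1.0 - y * fx)

print(hinge_loss(+1, 2.5))   # zero loss: correct side, outside the margin
print(hinge_loss(+1, 0.4))   # positive loss: correct side, but inside the margin
print(hinge_loss(-1, 0.4))   # larger loss: wrong side of the hyperplane
```

Points that are classified correctly with a margin of at least 1 contribute no loss, which is why minimizing this loss pushes the hyperplane away from the nearest observations.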


Logistic regression predicts whether something is true or false. Instead of fitting a line to the
data, it fits an S-shaped logistic function whose curve goes from 0 to 1. This function, also called the
sigmoid function, is calculated as in (8).


$$
S(x) = \frac{1}{1 + e^{-x}} \tag{8}
$$

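
As a small illustration, the sigmoid maps any real-valued score to a probability-like value that can be thresholded for a two-driver decision. The 0.5 threshold and the `predict_driver` helper are assumptions of this sketch; the score stands for a fitted linear combination of features.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x)), with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_driver(score: float, threshold: float = 0.5) -> str:
    # 'score' stands for a fitted linear combination of features (hypothetical here)
    return "D" if sigmoid(score) >= threshold else "A"

print(sigmoid(0.0))          # 0.5
print(predict_driver(2.0))   # D
print(predict_driver(-2.0))  # A
```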
Naive Bayes is a classifier based on Bayes' theorem. It assumes that the presence of a particular
feature in a class is unrelated to the presence of any other feature. A Naive Bayes model is easy to build
for large datasets and can outperform more sophisticated classification methods. The posterior
probability used by Naive Bayes is calculated as shown in (9).


$$
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \tag{9}
$$
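
A worked example of the posterior computation for two drivers is sketched below. The per-feature likelihoods and the equal priors are made-up numbers chosen only to show the arithmetic.

```python
def naive_likelihood(per_feature: list) -> float:
    """Under the independence assumption, P(x|c) is the product of per-feature terms."""
    result = 1.0
    for p in per_feature:
        result *= p
    return result

def posterior(likelihoods: dict, priors: dict) -> dict:
    """Apply Bayes' rule: P(c|x) = P(x|c) P(c) / P(x) for every class c."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())            # P(x), summed over classes
    return {c: joint[c] / evidence for c in joint}

# Made-up per-feature likelihoods for two drivers and two features
likelihoods = {"A": naive_likelihood([0.5, 0.2]), "D": naive_likelihood([0.25, 0.8])}
priors = {"A": 0.5, "D": 0.5}
print(posterior(likelihoods, priors))   # roughly one third for A and two thirds for D
```

Here P(x|A) = 0.1 and P(x|D) = 0.2, so with equal priors the posterior for D is twice that for A and the instance is assigned to driver D.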


The reduced error pruning (REP) tree is a classification technique that generates a decision tree from
a given dataset. It can be seen as an extension of C4.5 that improves the pruning phase. The method uses a
separate pruning dataset and creates multiple trees in different iterations, and finally selects the best
one. The mean squared error of the tree's predictions is used as the measure [37], calculated as in (10).


$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{10}
$$


4.4. Performance metrics 

The dataset is represented as $X \in \mathbb{R}^{N \times d}$, and we selected 15 features from the
original 51 features. The new dataset is expressed as $X_1 = X - X(A_h + B_i + C_j + D_k)$. We have
considered supervised machine learning classifiers to identify driving behavior. Previous researchers
have worked with classifiers such as decision tree (DT), k-NN, random forest (RF), and MLP [9], and SVM,
RF, and k-NN [23].

We used the features that are both publicly available in OBD-II and present in the Ocslab dataset to
prepare the confusion matrix. Most researchers report accuracy when identifying driver behavior, and only
a few use precision and F-score [2].

To measure performance, we used four indicators: accuracy, precision, F-measure, and recall. The
classifiers are evaluated through the confusion matrix. Once the model is generated, the classifier is
tested on a test dataset to check the model accuracy. Precision is the fraction of positive predictions
that are correct. Recall is the number of correct positive predictions out of all positive predictions
that could have been made, i.e. out of all actual positives.

For a multi-class problem, the numbers of FPs, FNs, TPs, and TNs cannot be read directly from the
confusion matrix. The values of FP, FN, TP, and TN for class i (1 ≤ i ≤ n) are determined from the matrix
entries as per [38].


$$
TP_i = a_{ii} \tag{11}
$$


Indonesian J Elec Eng & Comp Sci, Vol. 30, No. 1, April 2023: 276-288 


Indonesian J Elec Eng & Comp Sci, ISSN: 2502-4752


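$$
FP_i = \sum_{j=1,\, j \ne i}^{n} a_{ji} \tag{12}
$$

$$
FN_i = \sum_{j=1,\, j \ne i}^{n} a_{ij} \tag{13}
$$

$$
TN_i = \sum_{j=1,\, j \ne i}^{n} \sum_{k=1,\, k \ne i}^{n} a_{jk} \tag{14}
$$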
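
The per-class counts can be computed from an n×n confusion matrix as sketched below. The sketch assumes that rows hold the actual classes and columns the predicted classes; the 3-driver matrix is a toy example, not data from the experiments.

```python
import numpy as np

def per_class_counts(cm: np.ndarray, i: int):
    """Return (TP, FP, FN, TN) for class i; rows are actual classes, columns predicted."""
    tp = cm[i, i]
    fp = cm[:, i].sum() - tp      # predicted as i, actually another class
    fn = cm[i, :].sum() - tp      # actually i, predicted as another class
    tn = cm.sum() - tp - fp - fn  # entries that involve neither row i nor column i
    return int(tp), int(fp), int(fn), int(tn)

cm = np.array([[50, 2, 3],
               [4, 40, 6],
               [1, 5, 44]])        # toy 3-driver confusion matrix
print(per_class_counts(cm, 0))     # (50, 5, 5, 95)
```

For class 0, the true negatives are the four entries outside row 0 and column 0 (40 + 6 + 5 + 44 = 95), which agrees with the double sum over j and k different from i.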


The final confusion matrix, which has dimension 2×2, comprises the average values of the n confusion
matrices for all classes. For a binary, i.e. two-class, problem, a confusion matrix gives the number of
false positives (FPs), false negatives (FNs), true positives (TPs), and true negatives (TNs). From this
confusion matrix, accuracy, precision, recall, and F1-score are calculated as in (15)-(18).


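$$
Precision = \frac{TP}{TP + FP} \times 100\% \tag{15}
$$

$$
Recall = \frac{TP}{TP + FN} \times 100\% \tag{16}
$$

$$
F1\text{-}score = \frac{2 \times precision \times recall}{precision + recall} \times 100\% \tag{17}
$$

$$
Accuracy = \frac{TP + TN}{(TP + FN) + (FP + TN)} \times 100\% \tag{18}
$$
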
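
The four indicators can be computed from the binary counts as in the sketch below. The counts are illustrative, precision and recall are kept as fractions inside the F1 computation and converted to percentages only at the end, and zero denominators are not handled.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1-score, and accuracy in percent from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": 100.0 * precision, "recall": 100.0 * recall,
            "f1": 100.0 * f1, "accuracy": 100.0 * accuracy}

print(binary_metrics(tp=90, fp=10, fn=30, tn=70))
```

With these counts, precision is 90%, recall is 75%, the F1-score is about 81.8%, and accuracy is 80%, which shows how a classifier can have high precision while still missing a quarter of the positive instances.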
Table 5. Confusion matrix metrics of several classifiers for drivers A and D

Classifier | Accuracy of the model | Precision | F1-Score | Recall | Class (Driver)
Naive Bayes | 96.15 | 92.1 | 95.3 | 98.8 | A
 | | 97.8 | 98.90 | 97.9 | D
Logistic Regression | 98.12 | 97.4 | 97.5 | 98.1 | A
 | | 98.8 | 98.8 | 99.0 | D
kNN | 99.99 | 1.00 | 1.00 | 1.00 | A
 | | 99.00 | 1.00 | 1.00 | D
REP Tree | 99.95 | 1.00 | 99.90 | 99.90 | A
 | | 1.00 | 1.00 | 1.00 | D
SVM | 99.88 | 98.99 | 98.87 | 98.99 | A
 | | 99.0 | 99.0 | 99.1 | D
ZeroR (Baseline) | 78.54 | - | - | 0.0 | A
 | | 78.0 | 88.8 | 1.0 | D
AdaBoost (Ensemble) | 99.91 | 1.00 | 99.9 | 99.8 | A
 | | 1.00 | 1.00 | 1.00 | D

Table 6. Confusion matrix metrics of several classifiers for all drivers

Classifier | Accuracy of the model on the full dataset | Precision | F1-Score | Recall | Class (Driver)
Naive Bayes | 29.00% | 41.8% | 37.4% | 33.8% | A
 | | 33.4% | 20.9% | 15.2% | D
KNN | 76.35% | 95.1% | 91.6% | 88.3% | A
 | | 76.1% | 76.1% | 76.1% | D
SVM | 64.00% | 95.5% | 96.1% | 97.3% | A
 | | 50.6% | 54.3% | 59.6% | D
REP Tree | 97.14% | 99.5% | 99.4% | 99.3% | A
 | | 96.6% | 96.7% | 96.6% | D
ZeroR (Baseline) | 14.03% | - | - | 0.00 | A
 | | 14.0% | 24.6% | 1.00% | D
AdaBoost (Ensemble) | 20.90% | 70.9% | 79.1% | 89.5% | A
 | | 15.5% | 26.9% | 1.00% | D

Figure 6. Precision using the Naive Bayes model for drivers A and D

Figure 7. Random multi-class probability prediction using Naive Bayes

Figure 8. Binary class probability prediction using Naive Bayes

Figure 9. Results of the kNN algorithm with different k values


5. RESULTS AND DISCUSSION 

To evaluate the generalization of the model, we used K-fold cross-validation, which gives low bias and
modest variance [39]. We used 10 folds: in each round, 9 folds are used for training and the remaining
fold is taken as the test data set, and we report the mean performance over the rounds.
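
The fold construction can be sketched as follows. The shuffling seed and the helper name `k_fold_indices` are assumptions of this example; the classifier training inside each round is omitted.

```python
import numpy as np

def k_fold_indices(n_samples: int, k: int = 10, seed: int = 1):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(100, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))   # 10 90 10
```

The reported score is then the mean of the per-round test scores, so every instance is used for testing exactly once.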

For the classification of the driver, we used the prominent supervised algorithms Naive Bayes,
logistic regression, k-NN, REP Tree, and SVM. Table 5 shows the results of all classifiers derived from
the confusion matrix. Among them, kNN shows the highest accuracy (99.99%) and Naive Bayes the lowest
(96.15%). Note that for this experiment we used 15 of the 51 features and two of the 10 drivers from the
Ocslab dataset. The ensemble classifiers AdaBoost and voting gave 99.91% and 60.22% accuracy
respectively.

We also calculated the accuracy for all drivers using the full dataset. Table 6 reports the model
accuracy on the full dataset together with the per-class metrics of only two drivers (A and D), and it
also gives the baseline accuracy of 14.03%. In this research, we used the ZeroR algorithm to calculate the
baseline. AdaBoost is an ensemble algorithm, and its accuracy is a useful reference in this research
because the other classifiers' accuracy is better than it. We consider the model statistically
significant because the accuracy of k-NN is better than the baseline accuracy.
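
The ZeroR baseline simply predicts the most frequent class for every instance, so its accuracy equals the share of that class. A minimal sketch with a made-up label distribution (not the Ocslab class counts) is:

```python
from collections import Counter

def zero_r_accuracy(labels) -> tuple:
    """ZeroR always predicts the most frequent class; return that class and its accuracy."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)

labels = ["A"] * 20 + ["D"] * 80          # toy label distribution, not the Ocslab counts
print(zero_r_accuracy(labels))            # ('D', 0.8)
```

Because every instance is assigned to the majority class, the recall of that class is 1 and the recall of every other class is 0, which is the pattern seen in the ZeroR rows of Tables 5 and 6.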

From the above discussion of Tables 5 and 6, we find that state-of-the-art algorithms provide the best
accuracy on the Ocslab dataset, using public OBD-II PIDs in service 01, when the number of drivers is
small. Figure 6 illustrates the precision of the Naive Bayes algorithm; the feature
“Activation_of_Air_Compressor” gives the highest result for drivers A and D. If we classify all drivers
with ZeroR, the baseline accuracy is 14.03%, whereas Naive Bayes shows 29.93% accuracy on the full Ocslab
dataset. This comparison indicates statistical significance. Figures 7 and 8 provide further evidence,
showing the multi-class and binary class probability predictions respectively. To classify the driver
precisely, we have to consider a smaller number of drivers: Figure 8 shows better results for driver D
than Figure 7 does. The accuracy for driver D increases from 0.14% to 0.8%, which is more statistically
significant.

We tuned the k hyperparameter of k-NN with the values 1, 3, 5, and 7, as shown in Figure 9, using a
batch size of 100 and the Euclidean distance as the distance function. Each value gives different
results: the higher the number of nearest neighbors, the lower the accuracy. The REP Tree has a tree size
of 1,397, with depth and learning rate of 26 and 0.001 respectively, and obtained 99.95% accuracy for
drivers A and D.


6. COMPARATIVE PERFORMANCE ANALYSIS 

Table 7 covers six datasets and seven work domains. It compares this work with previously published
works. Based on the classifier, application domain, and dataset, the highest accuracy of 99.99% belongs to
this work. The differences are statistically significant because different classes give different results
for the same dataset.
Table 7. Comparative analysis of this work and previous works

Method/Work done | Work domain | Dataset | No. of classes | Application domain | Classifier | Accuracy
This work | Driver identification | Ocslab | 2 (A & D) | Machine Learning | kNN | 99.99%
Kwak et al. [9] | Driver profiling | Ocslab | 2 (A & E) | Machine Learning | kNN | 95.70%
Wakita et al. [18], [25], [31], [34] | Driver identification | Driver signal data | 276 | - | GMM | 16%
Azadani and Boukerche [23] | Automobile driver fingerprinting | In-Vehicle Data | 115 Drivers) | Machine Learning | kNN | 100%
Zhang et al. [40] | Driver behavior identification | Ocslab | 2 (B & C) | Deep Learning | LSTM-15 | 99.82%
Abdennour et al. [27] | Driver identification | Vehicular data trace-2 | 2 (4 Drivers) | Deep Learning | LSTM | 99.00%
Choi et al. [24] | Classification of driver behavior | Vehicle signal | 6 | Statistical | Hidden Markov Model (HMM) | 25.00%
Ullah and Kim [16] | Lightweight driver behavior identification | Ocslab | - | Deep Learning | GRU | 98.72%
Nishiwaki et al. [34] | Driver identification | Gas and brake pedal reading | 276 | - | GMM | 76.00%
Xun et al. [26] | Driver fingerprinting | - | 10 | Deep Learning | CNN | 100%


7. CONCLUSION 

Our prime aim is driver identification using telemetric data, judged by the best accuracy of the
classifiers. The CAN-Bus data was collected through OBD-II. Only public PIDs are used in this research,
because the non-public PIDs available in OBD-II are hard to identify. Previous researchers also used the
same PIDs with several classifiers. Logistic regression is a prominent supervised learning method. We
could not build the model using the full Ocslab dataset, even with a 24-hour continuous process. With the
partial dataset of two drivers, we built the model successfully and achieved good accuracy. We achieved
statistical significance because the accuracy of the baseline classifier is lower than that of the
others. Moreover, the accuracy of the ensemble and the other supervised learning classifiers is almost
the same. The kNN classifier involves computational complexity: it takes more processing time to build
the model when a higher number of nearest neighbors is considered, and it gives lower accuracy; in
contrast, we found the highest accuracy with a lower number of nearest neighbors and less computational
complexity. Identifying the driver requires close to 100% accuracy. In this regard, kNN shows the highest
accuracy of 99.99% for two drivers with 15 features, whereas for the full data set with 10 drivers the
accuracy is 76.36%, which is unsuitable for driver identification. In the future, we plan to use compound
features for new research with the Ocslab dataset.